ABSTRACT

Many of the cosmological tests to be performed by planned dark energy experiments will require extremely 
well-characterized photometric redshift measurements. Current estimates for cosmic shear are that the true 
mean redshift of the objects in each photo-z bin must be known to better than 0.002(1 + z), and the width of 
the bin must be known to ~0.003(1 + z) if errors in cosmological measurements are not to be degraded significantly. 
A conventional approach is to calibrate these photometric redshifts with large sets of spectroscopic 
redshifts. However, at the depths probed by Stage III surveys (such as DES), let alone Stage IV (LSST, JDEM, 
Euclid), existing large redshift samples have all been highly (25-60%) incomplete, with a strong dependence of 
success rate on both redshift and galaxy properties. A powerful alternative approach is to exploit the clustering 
of galaxies to perform photometric redshift calibrations. Measuring the two-point angular cross-correlation be- 
tween objects in some photometric redshift bin and objects with known spectroscopic redshift, as a function of 
the spectroscopic z, allows the true redshift distribution of a photometric sample to be reconstructed in detail, 
even if it includes objects too faint for spectroscopy or if spectroscopic samples are highly incomplete. We 
test this technique using mock DEEP2 Galaxy Redshift survey light cones constructed from the Millennium 
Simulation semi-analytic galaxy catalogs. From this realistic test, which incorporates the effects of galaxy bias 
evolution and cosmic variance, we find that the true redshift distribution of a photometric sample can, in fact, 
be determined accurately with cross-correlation techniques. We also compare the empirical error in the recon- 
struction of redshift distributions to previous analytic predictions, finding that additional components must be 
included in error budgets to match the simulation results. This extra error contribution is small for surveys 
which sample large areas of sky (≳10-100 degrees), but dominant for ~1 square degree fields. We conclude 
by presenting a step-by-step, optimized recipe for reconstructing redshift distributions from cross-correlation 
information using standard correlation measurements. 

Subject headings: galaxies: distances and redshifts — large-scale structure of the universe — surveys — cos- 
mology: observations 



1. INTRODUCTION 

For many years it was thought that the expansion of the 
universe should be slowing due to the gravitational attraction of matter, but measurements of Type Ia supernovae and other observations have shown that the expansion rate is in fact accelerating (Riess et al. 1998; Perlmutter et al. 1999). This accelerating expansion is generally attributed to an unknown component of the energy density of the universe commonly referred to as "dark energy." One of the goals of future cosmological probes (e.g. LSST, JDEM, and Euclid; Tyson & Angel 2001; Tyson 2005; Albrecht et al. 2009; Beaulieu et al. 2010) is to determine constraints on dark energy equation of state parameters, e.g. $w = P/\rho$ and $w_a = dw/da$ (Johri & Rath 2007), where $P$ is the pressure from dark energy, $\rho$ is its mass density, and $a$ is the scale factor of the universe (normalized to be 1 today). 

In order for these experiments to be successful, we re- 
quire information about the redshift of all objects used to 
make measurements. However, it is impractical to measure 
spectroscopic redshifts for hundreds of millions of galaxies, 
especially extremely faint ones. We can measure the redshift of many more objects from photometric information, e.g. by using a large set of spectroscopic redshifts to create templates of how color varies with redshift (Connolly et al. 1995). However, current and future spectroscopic surveys will be highly incomplete due to selection biases dependent on redshift and galaxy properties (Cooper et al. 2006). Because of this, along with the catastrophic photometric errors¹ that can occur at a significant (~1%) rate (Sun et al. 2009; Bernstein & Huterer 2010), photometric redshifts are not as well understood as redshifts determined spectroscopically. If future dark energy experiments are to reach their goals, it is necessary to develop a method of calibrating photometric redshifts with high precision (Albrecht et al. 2006; Huterer et al. 2006; Ma et al. 2006). Current projections for LSST cosmic shear measurements estimate that the true mean redshift of objects in each photo-z bin must be known to better than ~0.002(1 + z) (Zhan & Knox 2006; Zhan 2006; Knox et al. 2006; Tyson 2006) with stringent requirements on the fraction of unconstrained catastrophic outliers (Hearin et al. 2010), while the width of the bin must be known to ~0.003(1 + z) (LSST Science Collaborations: Paul A. Abell et al. 2009). 

In this paper we test a new technique for calibrating photo- 
metric redshifts measured by other algorithms, which exploits 

¹ Such as contamination from overlapping or unresolved objects; this is a frequent problem in deep surveys, particularly at high redshifts (cf. Newman et al. 2010). 
the fact that objects at similar redshifts tend to cluster with 
each other. If we have two galaxy samples, one with only pho- 
tometric information and the other consisting of objects with 
known spectroscopic redshifts, we can measure the angular 
cross-correlation between objects in the photometric sample 
and the spectroscopic sample as a function of spectroscopic 
z. This clustering will depend on both the intrinsic clustering 
of the samples with each other and the degree to which the 
samples overlap in redshift. Autocorrelation measurements 
for each sample give information about their intrinsic cluster- 
ing, which can be used to break the degeneracy between these 
two contributions. The principal advantage of this technique 
is that, while the two sets of objects should overlap in redshift 
and on the sky, it is not necessary for the spectroscopic sample 
to be complete at any given redshift. Therefore it is possible 
to use only the brightest objects at a given z, from which it is 
much easier to obtain secure redshift measurements, to cali- 
brate photometric redshifts. Even systematic incompleteness 
(e.g. failing to obtain redshifts for galaxies of specific types) 
in the spectroscopic sample is not a problem, so long as the 
full redshift range is sampled. This method is effective even 
when the two samples do not have similar properties (e.g. dif- 
fering luminosity and bias). 

We here describe a complete end-to-end implementation 
of cross-correlation methods for calibrating photometric red- 
shifts and present the results of applying these algorithms to 
realistic mock catalogs. Throughout the paper we assume a flat ΛCDM cosmology with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, and Hubble parameter $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$, where we have assumed $h = 0.72$, matching the Millennium simulations, where it is not explicitly included in formulae. In §2 we describe the catalog and data sets used to test cross-correlation methods. In §3 we provide a description of the reconstruction techniques used in detail, and in §4 we provide the results of the calculation. In §5 we conclude, as well as give a more concise description of the steps taken, providing a recipe for cross-correlation photometric redshift calibration. 

2. DATA SETS 

To test this method, it is necessary to construct two sam- 
ples of galaxies, one with known redshift ("spectroscopic") 
and the other unknown ("photometric"). We have done this 
using mock DEEP2 Redshift Survey light cones produced by 
Darren Croton. A total of 24 light cones were constructed by taking lines-of-sight through the Millennium Simulation halo catalog (Lemson & Virgo Consortium 2006) with the redshift of the simulation cube used increasing with distance from the observer (Kitzbichler & White 2007). The light cones were then populated with galaxies using a semi-analytic model whose parameters were chosen to reproduce local galaxy properties (Croton et al. 2006). Each light cone covers the range 0.10 < z < 1.5 and corresponds to a 0.5 x 2.0 degree region of sky. The galaxies in this mock catalog will have properties (including color, luminosity, and large-scale structure bias) which vary with redshift due to the same factors believed to affect real galaxy evolution. The semi-analytic model used is certainly imperfect, but yields samples of galaxies that pose the same difficulties (e.g. bias evolution and differences in clustering between bright and faint objects) as real surveys will exhibit; they therefore provide a realistic test of our ability to reconstruct redshift distributions of faint samples using spectroscopy of only a brighter subset. 

The spectroscopic sample is generated by selecting 60% 
of objects with observed R-band magnitude R < 24.1, 
which gives a sample whose characteristics resemble the 
DEEP2 Galaxy Redshift survey (Newman et al. 2010, in 
prep.). The mean number of spectroscopic objects over 
the 24 light cones is 35,574. The size of this sample 
is comparable to the number of objects predicted to be needed for calibration using template-based methods (~$10^5$; LSST Science Collaborations: Paul A. Abell et al. 2009; Ma & Bernstein 2008). However, this sample differs greatly 
in what it contains: it consists only of relatively bright objects, rather than having to be a statistically complete sample extending as faint as the objects to which photometric redshifts will be applied (a necessity for accurate training or template development, as the spectral energy distributions of faint galaxies are observed to lie outside the range luminous galaxies cover, both at z ~ 0 and z ~ 1; Willmer et al. 2006; MacDonald & Bernstein 2010). Studies such as Bernstein & Huterer (2010) have assumed for such projections that 99.9% redshift success can be achieved for faint galaxy samples (e.g. of photometric-redshift outliers); however, that is a failure rate more than two orders of magnitude lower than that actually achieved by current large surveys on 10-meter class telescopes such as VVDS (Le Fèvre et al. 2005), zCOSMOS (Lilly et al. 2007), or DEEP2 (Newman et al. 2010, in prep.), surveys which are 1.5-5 magnitudes shallower than the limits of Stage III and Stage IV surveys such as DES and LSST. In contrast, as noted in §1, the cross-correlation techniques we focus on in this paper do not require a complete spectroscopic sample, and hence do not require improvements in redshift success over existing projects to provide an accurate calibration. 

The other sample, referred to hereafter as the photometric 
sample, is constructed by selecting objects in the mock catalog down to the faintest magnitudes available, with the probability of inclusion a Gaussian with $\langle z \rangle = 0.75$ and $\sigma_z = 0.20$. 
This emulates choosing a set of objects which have been 
placed in a single photometric redshift bin by some algorithm 
with Gaussian errors. It should be noted that, since the red- 
shift distribution of the mock catalog we select from is not 
uniform, the resulting redshift distribution of the photometric 
sample is not a pure Gaussian. The overall redshift distribu- 
tion of all objects in the catalog is fit well using a 5th degree 
polynomial, so the net distribution of the photometric sam- 
ple can be well represented by the product of this polynomial 
and a Gaussian. After applying this Gaussian selection to the 
mock catalog, we then randomly throw out half of the selected 
objects in order to cut down on calculation time. The mean 
number of objects in the final photometric sample over the 24 
light cones is 44,053. 
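
The Gaussian selection described above can be illustrated with a short sketch. This is not the authors' code: the function name `select_photometric`, the peak inclusion probability of 1 at the Gaussian mean, and the uniform toy parent catalog are all assumptions made for illustration (the real mock catalog has a non-uniform redshift distribution).

```python
import math
import random

def select_photometric(redshifts, mean_z=0.75, sigma_z=0.20,
                       keep_fraction=0.5, seed=0):
    """Return indices of objects kept in a mock photometric sample."""
    rng = random.Random(seed)
    selected = []
    for i, z in enumerate(redshifts):
        # Gaussian inclusion probability; normalising it to 1 at z = mean_z
        # is an assumption, the text only gives the mean and width.
        p = math.exp(-0.5 * ((z - mean_z) / sigma_z) ** 2)
        # keep_fraction emulates randomly discarding half of the selection.
        if rng.random() < p * keep_fraction:
            selected.append(i)
    return selected

# Toy parent catalog, uniform in 0.1 < z < 1.5 (hypothetical stand-in).
_rng = random.Random(1)
parent = [_rng.uniform(0.1, 1.5) for _ in range(20000)]
idx = select_photometric(parent)
mean_selected = sum(parent[i] for i in idx) / len(idx)
```

With a uniform parent the selected sample is close to a pure Gaussian; with a realistic parent distribution the result is the product of the parent n(z) and the Gaussian, as noted in the text.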

The mock catalog includes both the cosmological redshift 
as well as the observed redshift for each object. The ob- 
served redshift shows the effects of redshift-space distortions 
(Hamilton 1998), and is the redshift value used for objects in 
the spectroscopic sample. When plotting the redshift distribution of the photometric sample we use the cosmological redshifts for each object (differences are small). Fig. 1 shows the number of galaxies as a function of redshift for each sample, as well as the entire catalog. While there is complete information on the actual redshift distributions for both samples in the catalog, only the distribution of the spectroscopic sample is assumed to be known in our calculations. We assume no information is known about the redshift distribution of the photometric sample, and attempt to recover it using only correlation measurements. 

Fig. 1. — The total number of galaxies in each sample as a function of redshift, summed over the 24 fields, binned with $\Delta z = 0.04$. The solid line is the overall redshift distribution for all galaxies in the mock catalogs, the dashed line is the distribution for our photometric sample (selected from the overall sample via a Gaussian in z, emulating objects placed in a single photometric redshift bin), while the dot-dashed line is the redshift distribution for our spectroscopic sample, selected to have magnitude R < 24.1. 

3. METHOD 

After constructing the two samples of objects from each 
mock catalog, we can use standard correlation measurements 
and exploit the clustering of galaxies to recover the redshift 
distribution of the photometric sample. From here on, the 
spectroscopic sample, with known observed redshifts, will 
be labeled 's', and the photometric sample, with redshifts assumed unknown, will be labeled 'p'. 

The most fundamental correlation measurements we use are 
the real space two-point correlation function and the angular 
two-point correlation function. The real space two-point correlation function $\xi(r)$ is a measure of the excess probability $dP$ (above that for a random distribution) of finding a galaxy in a volume $dV$, at a separation $r$ from another galaxy (Peebles 1980): 

$$dP = n\,[1 + \xi(r)]\,dV, \tag{1}$$

where $n$ is the mean number density of the sample. The angular two-point correlation function $w(\theta)$ is a measure of the excess probability $dP$ of finding a galaxy in a solid angle $d\Omega$, at a separation $\theta$ on the sky from another galaxy (Peebles 1980): 

$$dP = \Sigma\,[1 + w(\theta)]\,d\Omega, \tag{2}$$

where $\Sigma$ is the mean number of galaxies per steradian (i.e., the surface density). From the spectroscopic sample we measure the real space two-point autocorrelation function, $\xi_{ss}(r,z)$, and from the photometric sample we measure the angular two-point autocorrelation function, $w_{pp}(\theta)$. These measurements give information about the intrinsic clustering of the samples. We also measure the angular cross-correlation function between the spectroscopic and photometric sample, $w_{sp}(\theta,z)$, as a function of redshift. This is a measure of the excess probability of finding a photometric object at an angular separation $\theta$ from a spectroscopic object, completely analogous to $w_{pp}$. 

Modeling $\xi(r)$ as a power law, $\xi(r) = (r/r_0)^{-\gamma}$, which is an accurate assumption from ~0.5 to ~20 $h^{-1}$ comoving Mpc for both observed samples and those in the mock catalogs, we can determine a relation between the angular cross-correlation function $w_{sp}(\theta,z)$ and the redshift distribution. Following the derivation in Newman (2008) (cf. eq. 4), 

$$w_{sp}(\theta,z) = \frac{\phi_p(z)\,H(\gamma_{sp})\,r_{0,sp}^{\gamma_{sp}}\,\theta^{1-\gamma_{sp}}\,D(z)^{1-\gamma_{sp}}}{dl/dz}, \tag{3}$$

where $H(\gamma) = \Gamma(1/2)\,\Gamma((\gamma-1)/2)/\Gamma(\gamma/2)$ (where $\Gamma(x)$ is the standard Gamma function), $\phi_p(z)$ is the probability distribution function of the redshift of an object in the photometric sample, $D(z)$ is the angular size distance, and $l(z)$ is the comoving distance to redshift $z$. Hence, to recover $\phi_p(z)$ from $w_{sp}$, we also must know the basic cosmology (to determine $D(z)$ and $dl/dz$), as well as the cross-correlation parameters, $r_{0,sp}$ and $\gamma_{sp}$. It has been shown that uncertainties in cosmological parameters have minimal effect on the recovery of $\phi_p(z)$ (Newman 2008). To determine the cross-correlation parameters, we use the assumption of linear biasing, under which the cross-correlation is given by the geometric mean of the autocorrelations of the two samples, $\xi_{sp}(r) = (\xi_{ss}\,\xi_{pp})^{1/2}$. Thus we need to measure the autocorrelation functions for each sample and determine their parameters, $r_0$ and $\gamma$. 
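
As a minimal sketch of the quantities just introduced, the snippet below evaluates H(γ) and combines two power-law autocorrelations under linear biasing. For power laws, the geometric mean implies γ_sp = (γ_ss + γ_pp)/2 and r_0,sp^γ_sp = (r_0,ss^γ_ss r_0,pp^γ_pp)^(1/2). The function names and the numerical inputs are illustrative assumptions, not values from the paper.

```python
import math

def H(gamma):
    """H(gamma) = Gamma(1/2) Gamma((gamma-1)/2) / Gamma(gamma/2)."""
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))

def cross_parameters(r0_ss, gamma_ss, r0_pp, gamma_pp):
    """Linear biasing: xi_sp = sqrt(xi_ss * xi_pp) for two power laws.

    Returns (gamma_sp, r0_sp**gamma_sp).
    """
    gamma_sp = 0.5 * (gamma_ss + gamma_pp)
    r0g_sp = math.sqrt(r0_ss ** gamma_ss * r0_pp ** gamma_pp)
    return gamma_sp, r0g_sp

# Illustrative (made-up) parameters, r0 in h^-1 Mpc.
gamma_sp, r0g_sp = cross_parameters(4.0, 1.8, 3.0, 1.6)
r0_sp = r0g_sp ** (1.0 / gamma_sp)
```

The cross-correlation length `r0_sp` falls between the two input correlation lengths, as expected for a geometric mean.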

3.1. Autocorrelation of the Spectroscopic Sample 

We first need to determine how the real space autocorrelation function of the spectroscopic sample, $\xi_{ss}$, evolves with redshift. To do this we bin the spectroscopic objects in redshift and measure the two-point correlation function as a function of projected separation, $r_p$, and line-of-sight separation, $\pi$, for the objects in each bin. Because it is affected by redshift-space distortions in the line-of-sight direction, it is difficult to measure the evolution of $\xi_{ss}(r)$ accurately directly from the observed $\xi(r_p,\pi)$. However, as we describe later, we can use $\xi(r_p,\pi)$ to derive the projected correlation function, $w_p(r_p)$, which is not significantly affected by redshift-space distortions. The evolution of the projected correlation function with redshift can be related to the evolution of $\xi(r)$. 

To begin we measure $\xi$ in bins of $r_p$ and $\pi$, using the Landy & Szalay estimator (Landy & Szalay 1993): 

$$\xi(r_p,\pi) = \frac{1}{RR}\left[DD\left(\frac{N_R}{N_D}\right)^2 - 2\,DR\left(\frac{N_R}{N_D}\right) + RR\right], \tag{4}$$

where DD, DR, and RR are the number of object pairs in each bin of $r_p$ and $\pi$ (i.e., the number of cases where an object of type B is located a separation of $r_p$ and $\pi$ away from an object of type A), considering pairs between objects in the data catalog and other objects in the data catalog, between the data catalog and a random catalog, or within the random catalog, respectively; we will describe these catalogs in more detail shortly. Here $N_D$ and $N_R$ are the total numbers of objects in the data and random catalogs. For each object pair, we calculated the projected separation, $r_p$, and the line-of-sight separation, $\pi$, using the equations: 

$$r_p = D(z_{\rm mean})\,\Delta\theta \tag{5}$$

and 

$$\pi = |z_1 - z_2|\,\left.\frac{dl}{dz}\right|_{z_{\rm mean}}, \tag{6}$$

where $z_1$ and $z_2$ are the redshifts of the two objects in a pair, $\Delta\theta$ is their angular separation on the sky, and $z_{\rm mean} = (z_1 + z_2)/2$. 
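
The pair separations can be sketched numerically as follows, taking r_p = D(z_mean) Δθ and π = |z_1 - z_2| dl/dz evaluated at z_mean. The sketch assumes the flat cosmology adopted in this paper (Ω_m = 0.3, Ω_Λ = 0.7), distances in h^-1 Mpc, and that D(z) is the comoving angular size distance, which equals l(z) in a flat universe; that last identification and all function names are assumptions of this example.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc), in h^-1 Mpc

def dl_dz(z, omega_m=0.3, omega_l=0.7):
    """Derivative of comoving distance with redshift (flat LCDM)."""
    return C_OVER_H0 / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def comoving_distance(z, n=512):
    """l(z) from Simpson's rule applied to dl/dz (n must be even)."""
    if z == 0.0:
        return 0.0
    h = z / n
    total = dl_dz(0.0) + dl_dz(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * dl_dz(i * h)
    return total * h / 3.0

def pair_separation(z1, z2, dtheta_rad):
    """Projected and line-of-sight separations of a pair, in h^-1 Mpc."""
    z_mean = 0.5 * (z1 + z2)
    # Assumes D(z) = l(z), i.e. a comoving angular size distance.
    r_p = comoving_distance(z_mean) * dtheta_rad
    pi_los = abs(z1 - z2) * dl_dz(z_mean)
    return r_p, pi_los

r_p, pi_los = pair_separation(0.99, 1.01, 0.001)
```

For a pair at z ≈ 1 separated by 0.001 radians and Δz = 0.02, this gives a projected separation of a few h^-1 Mpc and a line-of-sight separation of a few tens of h^-1 Mpc.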

We calculate DD by measuring the transverse and line-of- 
sight distance between every pair of objects in the data sample 
and binning those distances to find the number of pairs as a 
function of $r_p$ and $\pi$. In this case the data sample is all of 
the objects in the chosen spectroscopic z-bin. In turn, RR is 
the pair count amongst objects in a "random" catalog, and 
DR is the cross pair count calculated using pairs between data 
objects and random catalog objects. We construct the random 
catalog to have the same shape on the sky as the data catalog, 
but its objects are randomly distributed with constant number 
of objects per solid angle (taking into account the spherical 
geometry). 
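
Once DD, DR, and RR have been binned, the Landy & Szalay estimator is a simple per-bin combination of the counts. The sketch below is a generic implementation of that standard estimator, not the authors' code; the function name and the convention of returning NaN for empty RR bins are assumptions.

```python
def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy & Szalay estimate per separation bin.

    dd, dr, rr are sequences of pair counts in matching bins; n_data and
    n_random are the catalog sizes used to normalise the counts.
    """
    ratio = n_random / n_data
    estimate = []
    for d, x, r in zip(dd, dr, rr):
        if r > 0:
            estimate.append((d * ratio ** 2 - 2.0 * x * ratio + r) / r)
        else:
            estimate.append(float("nan"))  # no random pairs in this bin
    return estimate

# Toy counts: the first bin is unclustered, the second has twice as many
# data-data pairs as a random distribution would give.
xi = landy_szalay(dd=[100, 200], dr=[1000, 1000], rr=[10000, 10000],
                  n_data=100, n_random=1000)
```

The same function applies unchanged to counts binned in (r_p, π), in r_p alone, or in θ, since the estimator only combines counts bin by bin.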

To measure the real space correlation function, the random 
catalog must also have the same redshift distribution as the 
data catalog. To produce this, we first determine a smooth 
function that fits the overall redshift distribution of the spec- 
troscopic sample and construct the random catalog to match. 
We had difficulty finding a single function that fit the entire distribution of R < 24.1 galaxies in the Millennium mock from z = 0.1 to z = 1.5, so we used different functional forms over different redshift ranges. The best fit resulted from using $\phi_s(z) \sim z^2 \exp(-z/z_0)$ for $0 < z < 1.03$ and $\phi_s(z) \sim A(1+z)^{\beta}$ for $z > 1.03$. We bin the objects in each field into bins of $\Delta z = 0.04$. Combining the distributions of all 24 fields and fitting via least-squares gave values of $z_0 = 0.232 \pm 0.003$ and $\beta = -2.74 \pm 0.18$. We then used these values, choosing a value of $A$ to force continuity at z = 1.03, to define the redshift distribution used to generate the random catalogs. The random catalog for each field contained ~10 times the number of objects as its corresponding data catalog. 
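
A sketch of how random-catalog redshifts could be drawn from the piecewise fit is given below. It assumes the low-redshift form is z^2 exp(-z/z_0) (the exponent is read from a damaged equation and should be treated as an assumption), uses the quoted best-fit values, and draws redshifts by rejection sampling, which is a choice made for this example rather than a method stated in the text.

```python
import math
import random

Z0, BETA, Z_BREAK = 0.232, -2.74, 1.03

def _low_z(z):
    # Assumed low-redshift form: z^2 exp(-z / z0).
    return z ** 2 * math.exp(-z / Z0)

# Choose A so the two branches agree at the break redshift.
A = _low_z(Z_BREAK) / (1.0 + Z_BREAK) ** BETA

def phi_s(z):
    """Unnormalised smooth redshift distribution of the spectroscopic sample."""
    return _low_z(z) if z < Z_BREAK else A * (1.0 + z) ** BETA

def draw_random_redshifts(n, z_min=0.1, z_max=1.5, seed=0):
    """Rejection-sample n redshifts following phi_s on [z_min, z_max]."""
    rng = random.Random(seed)
    peak = _low_z(2.0 * Z0)  # z^2 exp(-z/z0) is maximal at z = 2 z0
    draws = []
    while len(draws) < n:
        z = rng.uniform(z_min, z_max)
        if rng.random() * peak < phi_s(z):
            draws.append(z)
    return draws

random_z = draw_random_redshifts(5000)
```

In practice `n` would be about ten times the number of spectroscopic objects in the field, matching the random-to-data ratio quoted above.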

After constructing the random catalogs, we calculate the 
pair counts in each redshift bin. For each field, both the 
data and random catalogs are divided into subsamples ("z- 
bins") according to their redshift, and DD, DR, and RR 
are calculated for each bin of $r_p$ and $\pi$ using only objects within a given z-bin. In the $r_p$ direction we binned the separations in $\log(r_p)$ over the range $-3 < \log(r_p) < 2.5$ with $\Delta\log(r_p) = 0.1$, where $r_p$ is in $h^{-1}$ Mpc. In the $\pi$ direction we binned the separations over the range $0 < \pi < 30\ h^{-1}$ Mpc, with $\Delta\pi = 1.0\ h^{-1}$ Mpc. We calculated the pair counts in 10 
z-bins covering the range 0.11 < z < 1.4, where the size and 
location of each z-bin was selected so that there were approx- 
imately the same number of objects in each one. 
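
One simple way to obtain z-bins with approximately equal numbers of objects is to place the bin edges at quantiles of the sorted redshifts. The sketch below shows that approach; it is an assumption about how such bins might be built, since the text does not specify the procedure, and the function name is hypothetical.

```python
def equal_count_bin_edges(redshifts, n_bins=10, z_min=0.11, z_max=1.4):
    """Bin edges giving roughly equal numbers of objects per z-bin."""
    zs = sorted(z for z in redshifts if z_min < z < z_max)
    edges = [z_min]
    for k in range(1, n_bins):
        # The k-th interior edge sits at the k/n_bins quantile.
        edges.append(zs[(k * len(zs)) // n_bins])
    edges.append(z_max)
    return edges

# Toy redshifts spread evenly across the allowed range.
toy_z = [0.11 + 1.29 * (i + 0.5) / 1000 for i in range(1000)]
edges = equal_count_bin_edges(toy_z)
counts = [sum(1 for z in toy_z if edges[k] <= z < edges[k + 1])
          for k in range(len(edges) - 1)]
```

For a realistic, peaked redshift distribution the resulting bins are narrow near the peak and wide in the tails.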



When interpreting correlation measurements for the spectroscopic sample, we must take into account the effects of redshift-space distortions (Hamilton 1998). Since these only affect distance measurements along the line of sight, we integrate $\xi(r_p,\pi)$ in the $\pi$ direction, which gives the projected correlation function, $w_p(r_p)$. Modeling $\xi(r_p,\pi)$ as a power law and solving for $w_p(r_p)$ analytically gives 

$$w_p(r_p) = 2\int_0^{\infty} \xi\left[(r_p^2 + \pi^2)^{1/2}\right] d\pi \tag{7}$$

$$\phantom{w_p(r_p)} = r_p \left(\frac{r_0}{r_p}\right)^{\gamma} H(\gamma), \tag{8}$$

where $H(\gamma)$ is defined following equation 3. We thus can recover $\gamma_{ss}(z)$ and $r_{0,ss}(z)$ by fitting a power-law model to $w_p(r_p)$ in each z-bin, allowing us to measure how the correlation function evolves with redshift. Because signal-to-noise is poor at large scales for our field geometry, we fit for $w_p(r_p)$ up to $r_p = 10\ h^{-1}$ Mpc. The lower limit of $r_p$ used for the fit varied with redshift. We found in the highest redshift bins the behavior of $w_p(r_p)$ diverged from a power law, likely due to the semi-analytic model not populating group-mass halos with enough blue galaxies compared to DEEP2 data (Coil et al. 2008). Hence, for z < 0.8 we fit over the range $0.1 < r_p < 10\ h^{-1}$ Mpc, while for z > 0.8 we fit over $1.0 < r_p < 10\ h^{-1}$ Mpc. 

We cannot measure $\xi(r_p,\pi)$ to infinite line-of-sight separations, so to calculate $w_p(r_p)$ we must integrate $\xi(r_p,\pi)$ out to $\pi_{\rm max} = 30\ h^{-1}$ Mpc and then apply a correction for the fraction of the integral missed. In fact, in measuring $w_p(r_p)$, instead of evaluating $\xi(r_p,\pi)$ and then integrating, we simply summed the paircounts in the $\pi$ direction so DD, DR, and RR are functions of $r_p$ only; this method yielded more robust results. From equation 7 (integrating to $\pi_{\rm max}$ instead of infinity) we find 

$$w_p(r_p) = \frac{2\pi_{\rm max}}{RR}\left[DD\left(\frac{N_R}{N_D}\right)^2 - 2\,DR\left(\frac{N_R}{N_D}\right) + RR\right], \tag{9}$$

where DD, DR, and RR are the paircounts summed over the $\pi$ direction. For the correction, we first calculate $w_p(r_p)$ by summing the pair counts out to $\pi_{\rm max}$, and then fit for $r_0$ and $\gamma$ using the analytic solution given in equation 8. Using those parameters, we calculate $\int_0^{\pi_{\rm max}} \xi(r_p,\pi)\,d\pi \,/ \int_0^{\infty} \xi(r_p,\pi)\,d\pi$. We divide the observed $w_p(r_p)$ by this quantity and refit for $r_0$ and $\gamma$. This process is repeated until convergence is reached. 
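
The iterative correction for the finite line-of-sight integration limit can be sketched as below. For a power law, substituting π = r_p tan t turns the truncated integral into an integral of cos(t)^(γ-2), and the ratio to the full integral depends only on γ and r_p. The unweighted log-log least-squares fit, the fixed number of iterations, and all function names are assumptions of this example; the paper does not state its fitting method at this level of detail.

```python
import math

def H(gamma):
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))

def wp_model(rp, r0, gamma):
    """Analytic projected correlation function for a power-law xi(r)."""
    return rp * (r0 / rp) ** gamma * H(gamma)

def measured_fraction(rp, gamma, pi_max=30.0, n=1000):
    """Fraction of the full line-of-sight integral captured within pi_max."""
    t_max = math.atan(pi_max / rp)
    h = t_max / n
    total = 1.0 + math.cos(t_max) ** (gamma - 2.0)
    for i in range(1, n):  # Simpson's rule, n even
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (gamma - 2.0)
    return (total * h / 3.0) / (H(gamma) / 2.0)

def fit_power_law(rps, wps):
    """Unweighted least squares in log space; returns (r0, gamma)."""
    xs = [math.log(r) for r in rps]
    ys = [math.log(w) for w in wps]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
             / sum((x - xm) ** 2 for x in xs))
    gamma = 1.0 - slope
    intercept = ym - slope * xm
    r0 = math.exp((intercept - math.log(H(gamma))) / gamma)
    return r0, gamma

def iterate_correction(rps, wp_obs, n_iter=20):
    """Fit, correct for the missed integral, and refit repeatedly."""
    r0, gamma = fit_power_law(rps, wp_obs)
    for _ in range(n_iter):
        corrected = [w / measured_fraction(r, gamma)
                     for r, w in zip(rps, wp_obs)]
        r0, gamma = fit_power_law(rps, corrected)
    return r0, gamma

# Synthetic test: truncate a known power law, then recover its parameters.
rps = [10 ** (-1 + 0.1 * i) for i in range(21)]  # 0.1 to 10 h^-1 Mpc
wp_obs = [wp_model(r, 4.0, 1.8) * measured_fraction(r, 1.8) for r in rps]
r0_fit, gamma_fit = iterate_correction(rps, wp_obs)
```

In this noiseless test the captured fraction is close to 1 at small r_p and noticeably below 1 at r_p = 10 h^-1 Mpc, which is why the uncorrected fit is biased.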

3.2. Autocorrelation of the Photometric Sample 

Since we assume the photometric sample contains no redshift information (or, more realistically, that any available redshift information was already exploited by placing objects into a redshift bin), we determine its autocorrelation parameters by measuring the angular autocorrelation function, $w_{pp}(\theta)$, and relating it to $r_{0,pp}$ using Limber's equation (Peebles 1980): 

$$w_{pp}(\theta) = H(\gamma_{pp})\,\theta^{1-\gamma_{pp}} \int_0^{\infty} \phi_p^2(z)\, r_{0,pp}^{\gamma_{pp}}\, \frac{D(z)^{1-\gamma_{pp}}}{dl/dz}\, dz, \tag{10}$$

where $\gamma_{pp}$ may be measured directly from the shape of $w_{pp}(\theta)$. We measure the angular autocorrelation of the photometric sample, again using a Landy & Szalay estimator: 

$$w_{pp}(\theta) = \frac{1}{RR}\left[DD\left(\frac{N_R}{N_D}\right)^2 - 2\,DR\left(\frac{N_R}{N_D}\right) + RR\right], \tag{11}$$

where DD, DR, and RR are the paircounts as a function of separation, $\theta$, and $N_D$ and $N_R$ are the number of objects in the data and random catalogs for the field. For angular correlation measurements the random catalog consists of objects randomly distributed on the sky in the same shape as the data catalog. Again, the random catalog is 10 times larger than the data catalog. For each sample, we calculated the $\theta$ separation of every pair and binned them in $\log(\theta)$ over the range $-3 < \log(\theta) < 0.4$ with $\Delta\log(\theta) = 0.1$, where $\theta$ is measured in degrees. 

The angular correlation function can be related to the spatial correlation function: $w_{pp}(\theta) = A_{pp}\,\theta^{1-\gamma_{pp}}$, where $A_{pp} \sim r_{0,pp}^{\gamma_{pp}}$ (Peebles 1980). However, since the observed mean galaxy density in a field is not necessarily representative of the global mean density, our measurements of $w_{pp}(\theta)$ need to be corrected by an additive factor known as the integral constraint. To estimate this, we fit $w_{pp}(\theta)$ using a power law minus a constant, e.g. $w_{pp}(\theta) = A_{pp}\,\theta^{1-\gamma_{pp}} - C_{pp}$, where $C_{pp}$ is the integral constraint. For measuring the parameters we fit over the range $0.001^\circ < \theta < 0.1^\circ$. We found that fitting over this smaller range reduced the error in the amplitude measurements, although the error in the integral constraint (which is essentially a nuisance parameter) increases. For autocorrelation measurements this has little impact. We use the measured $\gamma_{pp}$, along with the parameters of the spectroscopic sample ($\gamma_{ss}(z)$ and $r_{0,ss}(z)$) and an initial guess of $r_{0,pp}$ to determine an initial guess of $r_{0,sp}^{\gamma_{sp}}$, employing the linear biasing assumption that 

$$r_{0,sp}^{\gamma_{sp}} = \left(r_{0,ss}^{\gamma_{ss}}\, r_{0,pp}^{\gamma_{pp}}\right)^{1/2}.$$

We expect the correlation length of the photometric sample, $r_{0,pp}$, to be a function of redshift, as both the underlying dark matter correlation function and the large-scale structure bias of the sample will evolve with z, both in the real universe and in our mock catalogs. To account for this, we assume the redshift dependence of the scale length, $r_0$, will be similar for both the photometric and spectroscopic samples (we considered several alternatives, but this yielded the best results); for our calculations we set $r_{0,pp}(z) \propto r_{0,ss}(z)$, with an initial guess of $r_{0,pp}(z) = r_{0,ss}(z)$. We then refine our initial guess for $r_{0,sp}^{\gamma_{sp}}$ by measuring the angular cross-correlation function in each redshift bin. 

3.3. Cross-correlation and $\phi_p(z)$ 

To find $w_{sp}(\theta,z)$, we measure the cross-correlation between objects in spectroscopic z-bins with all objects in the photometric sample. We bin the spectroscopic sample over the range 0.19 < z < 1.39 with a bin size of $\Delta z = 0.04$ and measure $w_{sp}(\theta)$ for each bin using the estimator 

$$w_{sp}(\theta) = \frac{1}{R_sR_p}\left[D_sD_p\left(\frac{N_{R_s}N_{R_p}}{N_{D_s}N_{D_p}}\right) - D_sR_p\left(\frac{N_{R_s}}{N_{D_s}}\right) - R_sD_p\left(\frac{N_{R_p}}{N_{D_p}}\right) + R_sR_p\right], \tag{12}$$

where $D_sD_p$, $D_sR_p$, $R_sD_p$, and $R_sR_p$ are the cross pair counts between samples as a function of $\theta$ separation, and $N$ is the number of objects in each sample. The cross pair counts are calculated by measuring the observed number of objects from one sample around each object in another sample. For example, $D_sD_p$ is the number of objects in the photometric sample around each spectroscopic object as a function of separation. For this measurement, each sample (the objects in the spec-z bin and the photometric sample) has its own random catalog that is ~10 times bigger than its corresponding data catalog. These are once again constructed by randomly distributing objects on the sky in the same shape as the data catalog. 

For each z-bin we measured $w_{sp}(\theta)$ in logarithmic bins of 0.1 in $\log(\theta)$ over the range $-3 < \log(\theta) < 0.4$, with $\theta$ measured in degrees. As with the autocorrelation function, we fit $w_{sp}(\theta) = A_{sp}\,\theta^{1-\gamma_{sp}} - C_{sp}$; the integral constraint is nonnegligible in these measurements. Again we fit over the range $0.001^\circ < \theta < 0.1^\circ$ to reduce the error in the amplitude measurements. In some z-bins, particularly where the amplitude, $A_{sp}$, is small, we found a significant degeneracy between $A_{sp}$ and $\gamma_{sp}$ when fitting. One can understand this as there being a pivot scale at which clustering is best constrained; one can simultaneously vary $A_{sp}$ and $\gamma_{sp}$ and still match $w_{sp}$ at that scale. To remove this degeneracy, we fixed $\gamma_{sp}$ in each bin, and only fit for the amplitude and integral constraint. Since the clustering of the samples with each other is expected to be intermediate to the intrinsic clustering of each sample, we estimated $\gamma_{sp}$ with the arithmetic mean of $\gamma_{pp}$ and $\gamma_{ss}$. Using $A_{sp}$ and $\gamma_{sp}$, as well as the initial guess for $r_{0,sp}^{\gamma_{sp}}$, we determine an initial guess of the redshift distribution $\phi_p(z)$. Rewriting equation 3 gives 

$$\phi_p(z) = \frac{dl/dz}{D(z)^{1-\gamma_{sp}}\,H(\gamma_{sp})\,r_{0,sp}^{\gamma_{sp}}}\,A_{sp}(z). \tag{13}$$

We then use the resulting $\phi_p(z)$, along with $A_{pp}$ and $\gamma_{pp}$, to redetermine $r_{0,pp}$ using equation 10, which we use to redetermine $r_{0,sp}^{\gamma_{sp}}$ and thus $\phi_p(z)$. This process is repeated until convergence is reached. 
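
With the slope held fixed, the model w = A θ^(1-γ) - C is linear in A and C, so the fit reduces to ordinary least squares; the amplitude can then be converted to φ_p(z) by inverting the power-law relation between w_sp and φ_p. The sketch below illustrates both steps. It assumes an unweighted fit and θ in radians (an amplitude fitted with θ in degrees would first need rescaling by (180/π)^(1-γ)); the function names and numerical inputs are hypothetical.

```python
import math

def H(gamma):
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))

def fit_amplitude_and_constraint(thetas, w_obs, gamma):
    """Least-squares fit of w = A * theta**(1 - gamma) - C, gamma fixed."""
    xs = [t ** (1.0 - gamma) for t in thetas]
    n = len(xs)
    sx, sy = sum(xs), sum(w_obs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, w_obs))
    amplitude = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    constraint = -(sy - amplitude * sx) / n  # intercept is -C
    return amplitude, constraint

def phi_p_from_amplitude(a_sp, gamma_sp, r0g_sp, d_z, dl_dz):
    """Invert w_sp = phi_p H r0^gamma theta^(1-gamma) D^(1-gamma) / (dl/dz)."""
    return a_sp * dl_dz / (d_z ** (1.0 - gamma_sp) * H(gamma_sp) * r0g_sp)

# Synthetic check with made-up numbers (theta in radians).
thetas = [10 ** (-4.5 + 0.1 * i) for i in range(20)]
w_obs = [0.01 * t ** (1.0 - 1.7) - 0.002 for t in thetas]
a_fit, c_fit = fit_amplitude_and_constraint(thetas, w_obs, 1.7)
```

In the full procedure this step would sit inside the loop described above: the recovered φ_p(z) updates r_0,pp through Limber's equation, which updates r_0,sp^γ_sp, which in turn updates φ_p(z).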



4. RESULTS 

For the remainder of the paper, we will frequently refer to making a "measurement" of the correlation functions and $\phi_p(z)$. Each measurement is done by selecting four fields at random out of the 24 mock catalogs, summing their pair counts, and calculating all necessary quantities; no information on 'universal' mean values of any measured quantity is used, but rather only that available from the chosen four fields. We select four fields in order to emulate redshift surveys like DEEP2 and VVDS, in which data is typically obtained from of order four separate fields; hence a 'measurement' in our parlance is roughly equivalent to utilizing the information coming from a single survey. To obtain the following results, we made $10^4$ measurements; we used the median values to evaluate statistical biases in a given quantity and the standard deviation to evaluate random uncertainties. In each plot following, the points are the median values and the error bars are the standard deviations, which gives the error on a single measurement. Because (given the large number of measurements) these medians should closely match the mean of the 24 fields, the standard error in a plotted point should be smaller than the plotted error bars by a factor of $\sqrt{6}$. 

Fig. 2. — The median value of $10^4$ measurements of the projected two-point correlation function of the spectroscopic sample, $w_p(r_p)$, in each redshift bin. Each measurement is made by averaging the paircounts of four fields selected at random from the 24 total fields. Error bars show the standard deviation of the measurements; i.e., they indicate the expected errors from a spectroscopic survey of four 1 square degree fields. The standard error in the plotted points is smaller than these error bars by a factor of $\sqrt{6}$ (2.45). At high redshift $w_p(r_p)$ deviates from a power law, whereas observed samples do not, due to the semi-analytic model not containing enough blue galaxies in group-mass halos. The solid line depicts a power-law model for $w_p(r_p)$, using the median values of the fit parameters $r_{0,ss}$ and $\gamma_{ss}$ across the $10^4$ measurements. The dashed line is the same in all panels; it is included to help make changes in the slope (i.e., $\gamma_{ss}$) and the amplitude (i.e., $r_{0,ss}$) with redshift clearer. We can see that changes in the amplitude with redshift are much more significant than changes in the slope. 
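
The four-field resampling scheme can be illustrated with a toy version in which each field contributes a single number and a "measurement" is the mean over four fields. This is a simplification: in the analysis described here pair counts are summed before any quantity is computed. Drawing the four fields without replacement, the function name, and the toy field values are assumptions of this sketch.

```python
import random
import statistics

def resample_measurements(field_values, n_fields=4, n_meas=10000, seed=0):
    """Median and standard deviation over many random n_fields subsets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_meas):
        # Four distinct fields per measurement (assumed: no replacement).
        chosen = rng.sample(field_values, n_fields)
        results.append(sum(chosen) / n_fields)
    return statistics.median(results), statistics.stdev(results)

# Toy per-field values standing in for some measured quantity.
fields = [float(i) for i in range(24)]
median_value, scatter = resample_measurements(fields)
```

The factor of √6 quoted in the text is √(24/4): the scatter of a four-field measurement is that much larger than the uncertainty on the mean of all 24 fields.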

It should be noted that we are ignoring the weak cross-correlation that should result from gravitational lensing by large-scale structure (Newman 2008; Bernstein & Huterer 2010). These correlations can be predicted directly from galaxy number counts (Scranton et al. 2005); planned surveys such as LSST will extend fainter than their nominal depth over limited regions of sky (LSST Science Collaborations: Paul A. Abell et al. 2009), so no extrapolation will be required. It should also be possible to use the initial estimate of $\phi_p(z)$ to predict the lensing-induced cross-correlation signal at a given redshift, and therefore iteratively remove its contribution. Because these correlation effects are weak, straightforward to deal with, and not present in the mock catalogs available to us, we do not consider them further here. 

To determine the evolution of the autocorrelation parameters
of the spectroscopic sample we measured w_p(r_p) in z-bins
of varying widths. Fig. 2 shows the median and standard
deviation of w_p(r_p) for 10^4 measurements in each spectroscopic
z-bin, with the correction for finite π_max applied as described
above. We then fit each measurement of w_p(r_p) for the
autocorrelation parameters. The solid lines in Fig. 2 show the
results of equation 8 corresponding to the median r_0,ss and γ_ss
for all measurements in a given z-bin, while Fig. 3 shows
the accuracy with which we can measure the evolution of r_0,ss
and γ_ss with redshift. The decrease of both parameters with
redshift is consistent with measurements in real samples which

Fig. 3. The correlation function parameters resulting from power-law fits
to w_p(r_p), r_0,ss and γ_ss, as a function of redshift. The points are the median
values of 10^4 measurements, and hence correspond to the parameters used
to generate the lines in Fig. 2; the error bars are the standard deviation of
each parameter amongst the measurements. The standard error in the plotted
points is smaller than these error bars by a factor of √6 (2.45). Each
measurement is made by averaging the paircounts of four fields selected at
random from the 24 total fields. While both parameters decrease with redshift,
we see that changes in r_0,ss are substantially greater than changes in
γ_ss.



Fig. 4. The median value of 10^4 measurements of the two-point correlation
function of the photometric sample, w_pp(θ), corrected for the integral
constraint. Each measurement is made by averaging the paircounts of four
fields selected at random from the 24 mock catalogs. Error bars show the
standard deviation of the measurements. The standard error in the plotted
points is smaller than these error bars by a factor of √6 (2.45). The solid line
is the fit to w_pp(θ) using the median values of the fit parameters A_pp and γ_pp;
a power-law model provides an excellent fit to the data.



show bluer galaxy samples have smaller r_0 and γ (Coil et al.
2008): a constant observed magnitude limit will correspond
to a selection at a bluer and bluer rest frame band as redshift 
goes up, increasingly favoring bluer objects for selection. 

The autocorrelation parameters for the photometric sample
are determined from the shape of w_pp(θ). Fig. 4 shows
the median and standard deviation of 10^4 measurements of
w_pp(θ), corrected for the integral constraint. A fit to each
measurement gives estimates of the autocorrelation parameters.
Taking the median values and standard deviations gives A_pp =
5.48 × 10^-4 ± 2.73 × 10^-4 and γ_pp = 1.55 ± 0.045. The solid
line in Fig. 4 corresponds to these median values. The scale
length of the photometric sample, r_0,pp(z), was assumed to be
proportional to r_0,ss(z); this yielded superior results to other
simple assumptions. The proportionality constant may then
be found using an initial guess of r_0,pp = r_0,ss to calculate
φ_p(z) using cross-correlation techniques, leading to a refined
estimate of r_0,pp using Limber's equation (eqn. 10). That
refined r_0,pp is then used to make an improved measurement
of φ_p(z), which is used to obtain a yet-improved measure of
r_0,pp, etc. After convergence was reached, we found that on
average r_0,pp/r_0,ss = 1.068.
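The power-law fit to the angular autocorrelation can be sketched as follows. This is a minimal illustration, not the fitting code used for the results above: it assumes the conventional form w(θ) = A θ^(1−γ), fits it as a straight line in log-log space without weights, and ignores the integral constraint. The function name `fit_power_law`, the θ grid, and the choice of degrees as the unit of θ are assumptions made for the example.

```python
import math

def fit_power_law(theta, w):
    """Unweighted least-squares fit of log w = log A + (1 - gamma) * log theta.

    Returns (A, gamma). Requires all w > 0; a real analysis would weight by
    the measurement errors and treat the integral constraint separately.
    """
    x = [math.log(t) for t in theta]
    y = [math.log(v) for v in w]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), 1.0 - slope

# Noise-free toy input built from the median parameters quoted in the text
A_true, gamma_true = 5.48e-4, 1.55
theta = [10 ** (-3 + 0.25 * i) for i in range(9)]   # hypothetical grid, 0.001 to 0.1
w = [A_true * t ** (1.0 - gamma_true) for t in theta]

A_fit, gamma_fit = fit_power_law(theta, w)
```

With noisy measurements, some bins can scatter to negative values, where the logarithm is undefined; a weighted fit in linear space avoids that problem.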

To determine the evolution of the cross-correlation parameters,
we measure the angular cross-correlation, w_sp(θ, z),
between objects in successive spectroscopic z-bins and the
photometric sample. Fig. 5 shows the median and standard
deviation of w_sp(θ) for 10^4 measurements in each z-bin, corrected



for the integral constraint. Fitting each measurement for the
cross-correlation parameters with fixed γ_sp as described above
and taking the median gives the amplitude, A_sp(z), shown in
Fig. 6. The solid lines in Fig. 5 correspond to the median of
the best-fit parameters from each measurement.

Combining the intrinsic clustering information from the
autocorrelation parameters of each sample with the amplitude of
the cross-correlation, A_sp(z), together with the basic cosmology,
gives the recovered redshift distribution. We found that a
linear fit of r_0,ss and γ_ss versus z resulted in a better recovery
of φ_p(z) than using each bin's value directly, resulting in a
~ 32% reduction in the χ² of the final reconstruction as compared
to the true redshift distribution. Fitting the correlation
function over a limited θ range, as described in § 3.3, reduced
the measured error in φ_p(z) for each z-bin by ~ 25% on average,
reducing the χ² in comparing the reconstructed and true
redshift distributions by ~ 30%. We also tried modeling γ_sp
as constant with z using the arithmetic mean of γ_ss(z = 0.77)
and γ_pp. This resulted in a ~ 20% increase in the χ² of the
final fit.

Fig. 7 shows the median and standard deviation of 10^4
measurements of φ_p(z) compared to the actual distribution.
To determine the actual distribution, we found the mean true
distribution of the four fields corresponding to each measurement
and took the median across the 10^4 measurements; this
should accurately match the true mean of the redshift distributions
over the 24 fields. Each measurement was normalized so
that integrating φ_p(z) over the measured redshift range gives
unity before the median was taken. It is important to note

Fig. 5. The median value of 10^4 measurements of the cross-correlation between the photometric and spectroscopic samples, w_sp(θ), in each redshift bin,
corrected for the integral constraint. Each measurement is made by averaging the paircounts of four fields selected at random from the 24 total fields. Error bars
show the standard deviation of the measurements. The standard error in the plotted points is smaller than these error bars by a factor of √6 (2.45). The solid
line is the fit to w_sp(θ) using the median values of the fit parameters A_sp and γ_sp. The dashed line is included to help make changes in the amplitude, A_sp(z), with redshift
clearer; in the fits shown the slope, γ_sp(z), is forced to be constant with z. It is clear that the amplitude of the correlation is much greater in the central region of
the redshift range where there are more photometric objects. The error on w_sp(θ) does not vary strongly with redshift, but rather the errors appear larger where
there are few objects as a consequence of plotting on a logarithmic scale: i.e., the amplitude of the correlation is smaller in those regions, which leads to a much
larger fractional error in w_sp(θ), and hence much larger error in log(w_sp), where w_sp is small, even though the error in w_sp itself remains unchanged.



that the reconstruction techniques we have implemented thus 
far will recover the actual redshift distribution of objects in 
the photometric sample. This will in general deviate from the 
true, universal redshift distribution of objects of that type due 
to sample/cosmic variance. We describe and test methods for 
recovering the underlying universal distribution in § 4.2.

We also looked at how well redshift distributions may be 
recovered in a single, 1 square degree field. For each field, 
the correlation functions were calculated using only the infor- 
mation from that field. To weight each bin when fitting for 
correlation-function parameters, the fit was calculated using 
errors given by the standard deviation of the correlation function
in each θ bin over the 24 fields. This mimics the common



situation where we have few fields with data and errors are de- 
termined from simulations. For a single field, a linear fit for 
the evolution of the spectroscopic-sample correlation function 
parameters was not a good model, so we used the calculated 
parameters in each z-bin. Fig. 8 shows the recovered distribution,
φ_p(z), in each of the 24 fields, compared to the true
redshift distribution of the photometric sample in that field. 

4.1. Correlation Measurement Errors 

In the course of our calculation of the redshift distribution,
we found that the error in φ_p(z) for each redshift bin was
larger than expected from the error model used in Newman
(2008), which uses the standard, classical weak-clustering
formalism. This formalism predicts that Poisson uncertainties

Fig. 6. The median value of 10^4 measurements of A_sp, the amplitude
of w_sp, in each redshift bin. Each plotted point corresponds to the amplitude
of one of the model lines shown in Fig. 5. Each measurement is made by
averaging the paircounts of four fields selected at random from the 24 mock
catalogs. Error bars show the standard deviation of the measurements. The
standard error in the plotted points is smaller than these error bars by a factor
of √6 (2.45). The amplitude is larger in the central region of the redshift
range where there are more photometric objects, which is expected since the
degree to which the two samples overlap in redshift contributes to the strength
of the cross-correlation function.



Fig. 7. Plot of the redshift distribution recovered using cross-correlation
techniques. The solid line is the actual distribution of the photometric sample
(combining all 24 fields), while the points are the median reconstructed
values from 10^4 measurements. Error bars show the standard deviation of the
recovered distribution when performing cross-correlation reconstruction in 4
0.5 × 2 deg fields, emulating the data available from existing deep redshift
surveys. The standard error in the plotted points is smaller than these error
bars by a factor of √6 (2.45). Each measurement is made by averaging the
paircounts of four fields selected at random from the 24 mock catalogs. The
recovered distribution follows the true distribution closely, even picking up
the irregular dip due to sample variance (also known as cosmic variance) at
the peak.



should dominate when the clustering strength (e.g. the value
of w_sp) is small compared to unity (Peebles 1980). Upon further
investigation we determined that the errors in all correlation
function measurements were larger than expected according
to this model, which led to the excess error in φ_p(z).
This additional error is associated with extra variance terms
identified by Bernstein (1994), which contribute significantly
even in the weak-clustering limit, contrary to the classical
assumption. These extra terms are dominated by the variance in
the integral constraint, which has a significant impact if
spectroscopic samples cover only a few square degrees of sky.

Fig. 9 compares the four terms of the predicted error
from Bernstein's error model to our measured error for
w_pp(θ). Bernstein's error model assumes the separation is
much smaller than the field size, so we see that for small θ the
predicted variance does follow our measured variance closely,
and then deviates as the separation becomes comparable to
the field size. The integral constraint term dominates at large
θ values. In order to calculate some of the variance terms of
Bernstein's model we required values for q_3 and q_4, which are
used to relate the three- and four-point correlation functions to
the two-point correlation function assuming hierarchical clustering.
For this we used the values measured by Bernstein in
simulation catalogs, q_3 = 0.32 and q_4 = 0.1 (Bernstein 1994).
This gave a better fit to our results than the values observed
in local galaxy samples (Meiksin et al. 1992; Szapudi et al.
1992).



From Fig. 9 we see that the measured variance can be orders
of magnitude larger than errors predicted using the weak-clustering
assumption (though the difference is a smaller factor
for w_sp, whose errors dominate in reconstructing φ_p(z)).
This excess variance will have a significant impact on the
error budgets of planned dark energy experiments (see the
next section for quantitative estimates); it is dominated by
the variance in the integral constraint, whose effect increases
with decreasing field size, so errors may be greatly reduced
by surveying galaxies over a larger area (≳ 100 square degrees
instead of ~ 4). For instance, the proposed BigBOSS
survey (Schlegel et al. 2009) would provide a near-ideal sample
for cross-correlation measurements (using both galaxies
and Lyman α absorption systems at redshifts up to ~ 3). We
may also reduce this effect by using better correlation function
estimators which reduce the effect of the integral constraint.
One example of such a robust estimator relies on convolving
the two-point correlation function with a localized filter
(Padmanabhan et al. 2007). We are currently testing the φ_p(z)
reconstruction using this estimator to determine its impact on
the error in our recovered redshift distribution.

4.2. Error Estimates 

In this subsection, we investigate the impact of these excess
correlation function measurement errors on our ability to recover
the parameters (i.e. the mean and σ) of the true redshift

Fig. 8. Plot of the recovered redshift distribution for each of the 24 fields, using only pair counts from a single field in the reconstruction. The error bars in
the first plot are the standard deviation of φ_p,rec(z) − φ_p,act(z) amongst the 24 fields; they should be representative of the expected error for each panel. For each
field, all errors used in fitting are based on standard deviations across the 24 fields. This mimics a common situation where we have only one field, but use errors
determined from simulations to weight points for fitting. The reconstruction generally captures the variation amongst fields due to sample/cosmic variance.



distribution for the photometric sample, and compare the re- 
sults to Monte Carlo tests done in Newman (2008). For each 
measurement we have a recovered distribution and an associ- 
ated true distribution for that set of four fields. We will test the 
recovery both of the underlying, universal distribution used to 
construct the photometric sample (i.e. ⟨z⟩ = 0.75, σ_z = 0.20)
and of the actual redshift distribution of the objects selected in 
a given set of fields (which will differ due to sample/cosmic 
variance; cf. § 4).

Before we can fit for Gaussian parameters, we must account 
for the fact that our photometric sample has a redshift distri- 
bution which differs from a true Gaussian because the total 
sample we drew from (with Gaussian probability as a func- 
tion of z) was not uniformly distributed in redshift. One can 
think of the actual distribution of the photometric sample in a 
given bin as a product of three factors: the overall redshift dis- 
tribution of all objects in the Universe (essentially, the rising 



curve in Fig. 1); the fractional deviation from the Universal mean
of the number of objects in a given field at a given redshift, 
i.e. sample/cosmic variance; and the Gaussian function used 
to select objects for the photometric redshift bin. 

The first two factors need to be removed from both the true 
and recovered distributions if we are to test the recovery of 
the third; this is implemented differently for each case. For 
the true distribution, we divide each measurement by the over- 
all dN/dz of all of the objects in the four fields used in that 
measurement. This removes the overall distribution shape as 
well as the fluctuations due to sample variance, and gives a 
true distribution that closely matches the Gaussian selection 
function applied to construct the sample. 

In principle we could do the same for the recovered distri- 
bution, but that would not be practical in real applications, as 
we can determine the overall shape of the redshift distribution 
of the overall photometric sample using photometric redshifts,

Fig. 9. The variance of 10^4 measurements of the autocorrelation of
the photometric sample, w_pp(θ) (thick solid line), compared to predicted error
terms from Bernstein (1994). The thick dashed line shows the sum of all
the variance terms; it corresponds well to the observed variance save at the
largest scales, where the Bernstein (1994) model is overly conservative (a
consequence of the assumption made in that work that the angular separations
considered are significantly smaller than the size of the field). From equation
38 in Bernstein (1994), the thin solid black line is the term that scales
as w², corresponding to the variance in the integral constraint, which dominates
at large θ. The thin three-dot-dash line is the term that scales as w³, and
the thin dot-dash line is the term that scales as 1/N. The thin dashed black
line is the term that scales as 1/N² and is comparable to the Poisson error,
which dominates in the weak clustering formalism used by Newman (2008).
The 'observed' variance in w_pp(θ) is much larger than the weak clustering
prediction; the same is true of w_sp(θ), although to a lesser degree.



but photo-z errors will prevent measuring fluctuations in the
number of objects within bins of small Δz. Hence, we correct
the recovered φ_p(z) using a low-order polynomial fit to the
shape of the overall sample's dN/dz, but use the fluctuations
(compared to a smooth fit) in the observed redshift distribution
of the spectroscopic sample dNs/dz, which will be known
from the same observations used to perform cross-correlation 
measurements, to correct for sample variance. This correction 
assumes that deviations from the mean in both samples be- 
have similarly with redshift; we might expect their amplitude 
to scale with the large-scale-structure bias of a given sample, 
but we do not apply any correction for that here. In tests, we 
have found that a correction using fluctuations in dNs/dz was 
as effective in constraining parameters as one based on fluc- 
tuations in the dN/dz of the overall sample our photometric 
subsample was selected from, and so we focus on the former, 
more realistic technique. 

In more detail, we first divided the recovered distribution 
by a smooth fit (using a 5th-degree polynomial function) to 
the overall dN/dz of the entire simulation averaged over all 
24 fields. This eliminates gradients associated with the shape 
of the parent sample's overall redshift distribution without 
removing deviations due to sample variance. To correct for 



the latter, we need to quantify the fluctuations in the
spectroscopic sample relative to a mean distribution. For this smooth,
mean distribution, ⟨dNs/dz⟩, we used the same fit to the redshift
distribution of the spectroscopic sample averaged over
all 24 fields which was employed to construct the random catalogs
for autocorrelation measurements (§ 3.1). Using a fit to
a given set of four fields would make little difference, as the
deviations from the smooth fit at a given redshift bin due to
sample variance are much larger than the deviations between
the smooth fits to 4 or 24 fields. We then calculate the ratio
(dNs/dz)/⟨dNs/dz⟩, where dNs/dz is the redshift distribution
of the spectroscopic sample averaged over the four fields used
in that measurement, and correct for sample variance by dividing
each measurement of φ_p(z) by this quantity.
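The sample-variance correction just described amounts to a bin-by-bin division followed by renormalization, which might be sketched as below. This is an illustrative toy rather than the pipeline used here: the function name, the redshift grid, the sinusoidal fluctuation pattern, and the constant smooth distribution are all hypothetical, and the earlier division by the polynomial fit to the overall dN/dz is omitted. As in the text, it assumes that both samples fluctuate by the same fractional amount at each redshift.

```python
import math

def correct_sample_variance(phi_rec, dn_s, dn_s_smooth, dz):
    """Divide phi_p(z) by (dNs/dz)/<dNs/dz> bin by bin, then renormalize."""
    corrected = []
    for phi, n_obs, n_smooth in zip(phi_rec, dn_s, dn_s_smooth):
        ratio = n_obs / n_smooth          # fluctuation relative to the smooth fit
        corrected.append(phi / ratio)
    norm = sum(corrected) * dz            # simple rectangle-rule integral
    return [c / norm for c in corrected]

# Toy example: a Gaussian selection function modulated by known fluctuations
dz = 0.05
z = [0.2 + dz * i for i in range(23)]
selection = [math.exp(-0.5 * ((zi - 0.75) / 0.20) ** 2) for zi in z]
fluct = [1.0 + 0.3 * math.sin(12.0 * zi) for zi in z]   # hypothetical sample variance
smooth = [100.0 for _ in z]                              # stand-in for the smooth fit
dn_s = [s * f for s, f in zip(smooth, fluct)]            # observed spectroscopic dNs/dz
phi_rec = [g * f for g, f in zip(selection, fluct)]      # recovered, with fluctuations

phi_corr = correct_sample_variance(phi_rec, dn_s, smooth, dz)
```

In this idealized case the correction recovers the normalized Gaussian selection function exactly; with real data the recovery is limited by noise in both φ_p(z) and dNs/dz.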

After applying these corrections to each distribution, each
measurement is normalized so that its integral is unity, and
then fit for ⟨z⟩ and σ_z using a normalized Gaussian fitting
function. Fig. 10 shows the median and standard deviation
of 10^4 measurements of the recovered φ_p(z) before and after
correcting for sample variance. In both plots the fit to the
overall dN/dz is divided out. It is clear to the eye that the 
distribution corrected for sample variance is a better fit to the 
underlying selection function; more quantitatively, it reduces 
errors in determining the parameters of the Gaussian selection 
function by ~ 10%. 

We assess the reconstruction of the photometric sample in
two ways. First, we compare the reconstructed parameters,
⟨z⟩ and σ_z, of the Gaussian selection function to the true values,
known by construction. Second, we compare the reconstructed
parameters of the selection function to the parameters
of a Gaussian fit to the actual normalized distribution of each
set of four fields used. The latter method should be more robust
to systematic errors in the 'true' dN/dz we divide each
measurement by.

For the first test, where ⟨z⟩_true = 0.75 and σ_z,true = 0.20,
we find ⟨⟨z⟩_rec − ⟨z⟩_true⟩ = 7.796 × 10^-4 ± 7.415 × 10^-3 and
⟨σ_z,rec − σ_z,true⟩ = 8.140 × 10^-4 ± 8.545 × 10^-3, where as usual
the values given are the median and standard deviation of all
measurements, respectively. For the second test, where ⟨z⟩_true
and σ_z,true are determined by a Gaussian fit to the true
distribution of each measurement, we find ⟨⟨z⟩_rec − ⟨z⟩_true⟩ =
7.259 × 10^-4 ± 7.465 × 10^-3 and ⟨σ_z,rec − σ_z,true⟩ = 4.724 ×
10^-4 ± 8.546 × 10^-3. In all cases, the bias is not statistically
significant (the standard error against which each bias estimate
must be compared is smaller than the quoted standard
deviations by a factor of √6), and in any event the overall bias
of both parameters is considerably smaller than the associated
random errors, and will therefore have little effect when added
in quadrature. These errors are still larger than the estimated
requirements for future surveys (i.e. σ ~ 2-4 × 10^-3, as described
in § 1). For cross-correlation techniques to meet these
requirements, this excess error will need to be reduced. We
discuss a few options for this in § 4.1.
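The significance argument above can be checked with a few lines of arithmetic. The sketch below assumes that the first-test values for the mean redshift are a bias of 7.796 × 10^-4 and a scatter of 7.415 × 10^-3, and that the requirement range is 2-4 × 10^-3; the variable names are ours.

```python
import math

n_fields_total = 24
n_fields_per_measurement = 4
n_independent = n_fields_total / n_fields_per_measurement   # 6 independent sets

# First-test values for the mean redshift, as read from the text (assumed)
bias = 7.796e-4       # median of (recovered - true)
scatter = 7.415e-3    # standard deviation over all measurements

# The bias must be compared to the standard error, not to the scatter itself
std_error = scatter / math.sqrt(n_independent)
significance = bias / std_error          # bias in units of its standard error

# Requirement range quoted in the text for future surveys
req_low, req_high = 2e-3, 4e-3
meets_requirement = scatter <= req_high
```

With these inputs `significance` comes out well below one, so the bias is consistent with zero, while `meets_requirement` is false because the random error exceeds the upper end of the requirement range.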

A number of choices we have made on how to model and 
measure correlation function parameters (e.g. using a fit for 
the dependence of the spectroscopic sample's autocorrelation 
parameters on z vs. using the values for a given z-bin directly;
assuming r_0,pp ∝ r_0,ss vs. a constant r_0,pp; or allowing
γ_sp(z) to decrease with redshift vs. forcing a constant γ_sp)

Fig. 10. Plots of the recovered and mean true redshift distribution of the 24 fields, after the overall redshift distribution of all galaxies in the mock catalogs,
dN/dz, is divided out, as described in § 4.2. On the left is the reconstruction before applying a correction for sample/cosmic variance based on fluctuations in
the spectroscopic redshift distribution in the fields observed, and on the right is the reconstruction after that correction. There is a significant improvement in
the reconstruction. The plot on the right corresponds to the reconstruction of the probability an object falls in the photometric redshift bin as a function of its
true z (or, equivalently, the reconstruction of the photometric redshift error distribution), rather than reconstructing the actual redshift distribution (affected by
sample/cosmic variance) of galaxies in a particular set of fields, as was depicted in Fig. 7. The solid line in each panel is the true normalized distribution of the
photometric sample and the points are the median values of 10^4 measurements. The true distribution matches the Gaussian selection function used for creating
the photometric sample, by construction. Error bars show the standard deviation of the recovered distribution. The standard error in the plotted points is smaller
than these error bars by a factor of √6 (2.45). Each measurement is made by averaging the paircounts of four fields selected at random from the 24 total mock
catalogs available. As shown here, if we know the amplitude of fluctuations from cosmic variance at a given redshift (using the variance in the distribution of
spectroscopic galaxies), as well as the overall distribution of the parent sample (e.g. from combining redshift distributions from all photometric redshift bins), we
can accurately reconstruct the true selection probability distribution.



can affect both the bias and error in these measurements. We
have tested reconstruction with alternate methods to those
described here and found that the random errors in ⟨z⟩ and σ_z
are much more robust to these changes than the bias. When
varying the three correlation parameters as described previously,
the standard deviation of the measurements never varied
by more than ~ 10%, but the bias in some cases increased
significantly. For measurements of ⟨z⟩, the alternative parameter
models yielded biases of 0.006-0.009, making them statistically
significant compared to the random errors. For σ_z,
the biases under the different scenarios were of similar order
of magnitude as our standard method, except for the case of
using the measured values for the spectroscopic correlation
function parameters (r_0 and γ) in each z-bin instead of a fit.
This yielded a bias in σ_z of ~ −0.009. From this we see that
the methods used to measure correlation parameters need to 
be considered carefully, since inferior methods can cause the 
bias to become comparable to random errors. 

From equation 13 in Newman (2008), the predicted errors
in ⟨z⟩ using the weak clustering formalism are essentially
identical to the errors in σ_z; that is true to ~ 20% in our results.
This error is a function of σ_z, as well as the surface density
of photometric objects on the sky, Σ_p, the number of objects
per unit redshift of the spectroscopic sample, dNs/dz, and the
cross-correlation parameters, γ_sp and r_0,sp. We use the mean
values of these parameters from our catalogs and find that the
predicted error on both parameters is σ = 1.064 × 10^-3. This
is considerably smaller than our measured error, which is not
surprising given the extra error terms in the correlation function
discussed in § 4.1.

Our analysis throughout this paper has considered the case
of a single-peaked, Gaussian selection function for placing
objects in a photometric bin. However, different distributions
would yield similar results, as the error in the recovery of
φ_p(z) at a given redshift depends primarily on the characteristics
of the spectroscopic sample and the overall size of
the photometric sample, but not φ_p(z) itself (Newman 2008).
We illustrate this in Fig. 11, where we have applied the same
analysis techniques described above (and laid out in the recipe
in § 5) for a selection function that consists of two equal-amplitude
Gaussian peaks centered at z = 0.5 and z = 1.0,
each with σ_z = 0.1; this figure can be compared to the right
panel of Fig. 10. We note that, since in this scenario the objects
selected are less concentrated in redshift, the effects of
bias evolution (as predicted by the semi-analytic models used)
should be greater here than in our standard case, but our recovery
remains accurate.

5. CONCLUSION 

Section 3 has described in detail the steps we took to recover
the redshift distribution, φ_p(z), of a photometric sample
by cross-correlating with a spectroscopic sample of known

Fig. 11. Results of cross-correlation reconstruction of a selection function
consisting of two equal-amplitude Gaussian peaks centered at z = 0.5
and z = 1.0, each with σ_z = 0.1. The solid line is the true distribution of
the photometric sample (combining all 24 fields), while the points are the
median reconstructed values from 10^4 measurements. Error bars show the
standard deviation. The standard error in the plotted points is smaller than
these error bars by a factor of √6 (2.45). Each measurement is made by
averaging the paircounts of four fields selected at random from the 24 mock
catalogs. This plot is analogous to the right panel of Fig. 10; as in that case,
we are reconstructing the selection function of the sample rather than its redshift
distribution. The effects of bias evolution should be greater in this case,
however, as the sample is less concentrated in redshift. The recovery remains
accurate here, despite the larger bias evolution and very different φ_p(z).



redshift distribution. We will now summarize the procedure 
used to make this calculation, to facilitate its application to 
actual data sets. 

• Obtain the necessary information for each sample: RA,
dec and redshift for the spectroscopic sample, and RA
and dec for the photometric sample.

• Create the random catalogs for each sample. (§ 3.1-3.3)

• Calculate the data-data, data-random, and random- 
random paircounts for each correlation function. 

• For w_p(r_p): bin the spectroscopic sample and its corresponding
random catalog in redshift. In each spectroscopic
z-bin, calculate Δr_p and Δπ for each pair and bin
the pair separations into a grid of log(r_p) and π. Then
sum the paircounts in the π direction. (§ 3.1)

• For w_pp(θ): using the 'p' sample and its random catalog,
calculate Δθ for each pair and bin the pair separations
into log(θ) bins. (§ 3.2)

• For w_sp(θ, z): bin the spectroscopic sample and its corresponding
random catalog in redshift. For each spectroscopic
z-bin, calculate the pair separations, Δθ, for
pairs between the 's' and 'p' samples and their random
catalogs and bin them into log(θ) bins. (§ 3.3)



• Use the paircounts to calculate the correlation functions
using standard estimators (e.g. Landy & Szalay). (§ 3.1-3.3)

• Calculate the parameters of w_p(r_p) (r_0,ss(z), γ_ss(z)) and
w_pp(θ) (A_pp, γ_pp) by fitting as described above. (§ 3.1-3.2)

• Use the autocorrelation parameters along with an initial
guess of r_0,pp (e.g. r_0,pp ~ r_0,ss) to calculate
$r_{0,sp}^{\gamma_{sp}}(z) = (r_{0,ss}^{\gamma_{ss}} \, r_{0,pp}^{\gamma_{pp}})^{1/2}$. (§ 3.3)
This gave a more accurate reconstruction
of φ_p(z) (reducing χ² by 33%) than the assumption
r_0,pp = constant; in fact, a calculation of ξ_pp(r) from the
simulation sample directly showed r_0,pp to have similar behavior
to r_0,ss. Using a linear fit of r_0,ss(z) and γ_ss(z) reduced
χ² by ~ 32% compared to utilizing the noisier reconstructed
values in each z-bin.

• Estimate γ_sp = (γ_ss + γ_pp)/2. Using this γ_sp, calculate
the amplitude, A_sp(z), of w_sp(θ, z) by fitting as described
above. (§ 3.3) We fit over the range 0.001° < θ < 0.1°.
We found that fitting over this smaller θ range resulted in
smaller errors in the amplitude, A_sp(z), which reduced the
error in φ_p(z) for each z-bin by ~ 25% on average. We
fix γ_sp because of degeneracies between γ_sp and A_sp when
fitting them simultaneously. This degeneracy is especially
strong in regions where φ_p(z) is small. We also tried modeling
γ_sp as constant with z using the arithmetic mean of
γ_ss(z = 0.77) and γ_pp; however, that method increased the
χ² of the final fit by ~ 20%.

• Combining the results of the last two steps and the assumed
cosmology, calculate φ_p(z) using equation 13.
(§ 3.3) We also tried calculating φ_p(z) using the integrated
cross-correlation function, w(z), integrating to an angle
equivalent to a comoving distance r_max = 10 h^-1 Mpc (Newman
2008); however, that method produced inferior results.

• Using φ_p(z), along with the calculated A_pp and γ_pp, in
equation 10 gives a new r_0,pp, which is then used to recalculate
$r_{0,sp}^{\gamma_{sp}}(z)$. Putting this back into equation 13
gives a new φ_p(z). This is repeated until convergence
is reached. (§ 3.3)

• To recover the underlying/universal distribution of objects
of the type selected for the photometric sample,
rather than the distribution within the specific fields
chosen for observation, correct for sample/cosmic variance
using the fluctuations in the redshift distribution of
the spectroscopic sample; i.e., construct a smooth function
describing the overall redshift distribution of the
spectroscopic sample, $\langle dN_s/dz \rangle$, and divide $\phi_p(z)$ by the ratio
$(dN_s/dz) / \langle dN_s/dz \rangle$.
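
The estimator and amplitude-fitting steps of the recipe above can be sketched in Python as follows. This is a minimal illustration rather than the code used for this work: the function and argument names are hypothetical, the pair counts are assumed to have been tabulated already in $\log(\theta)$ bins, and the fit assumes the usual power-law form $w(\theta) = A\,\theta^{1-\gamma} - C$ with an additive integral-constraint term $C$. With $\gamma$ held fixed, the model is linear in $A$ and $C$, so an ordinary weighted least-squares solve suffices.

```python
import numpy as np


def landy_szalay_cross(d1d2, d1r2, r1d2, r1r2, n_d1, n_d2, n_r1, n_r2):
    """Landy & Szalay estimator for a cross-correlation, per angular bin.

    Raw pair counts are first normalized by the total number of possible
    pairs, then combined as w = (D1D2 - D1R2 - R1D2 + R1R2) / R1R2.
    Sample 1 could be the 's' sample and sample 2 the 'p' sample.
    """
    dd = np.asarray(d1d2, dtype=float) / (n_d1 * n_d2)
    dr = np.asarray(d1r2, dtype=float) / (n_d1 * n_r2)
    rd = np.asarray(r1d2, dtype=float) / (n_r1 * n_d2)
    rr = np.asarray(r1r2, dtype=float) / (n_r1 * n_r2)
    return (dd - dr - rd + rr) / rr


def fit_amplitude_fixed_slope(theta, w, w_err, gamma):
    """Fit w = A * theta**(1 - gamma) - C for A and C with gamma held fixed.

    Assumed model form (see lead-in). Returns the tuple (A, C).
    """
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(w, dtype=float)
    weights = 1.0 / np.asarray(w_err, dtype=float)
    # Design matrix: one column for the power law, one for the constant.
    design = np.column_stack([theta ** (1.0 - gamma), -np.ones_like(theta)])
    coeffs, *_ = np.linalg.lstsq(design * weights[:, None], w * weights,
                                 rcond=None)
    amplitude, constant = coeffs
    return amplitude, constant
```

In a setup like this, `fit_amplitude_fixed_slope` would be called once per spectroscopic z-bin, with `theta` restricted to the fitting range quoted above and `gamma` set to the estimated $\gamma_{sp}$.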

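The parameter-combination, iteration, and variance-correction steps of the recipe can likewise be sketched in Python. Again this is only an illustration under stated assumptions: all names are hypothetical, and `update_phi` is a placeholder for a user-supplied function that applies equation 10 and then equation 13 to an existing estimate of $\phi_p(z)$; neither equation is reimplemented here. The convergence criterion (largest absolute change below a tolerance) is likewise an assumption, not the criterion used in this work.

```python
import numpy as np


def cross_power_law(r0_ss, gamma_ss, r0_pp, gamma_pp):
    """Return (r0_sp**gamma_sp, gamma_sp) from the autocorrelation parameters.

    Uses r0_sp**gamma_sp = sqrt(r0_ss**gamma_ss * r0_pp**gamma_pp) and
    gamma_sp = (gamma_ss + gamma_pp) / 2, as in the recipe.
    """
    gamma_sp = 0.5 * (np.asarray(gamma_ss, dtype=float) + gamma_pp)
    r0_sp_pow = np.sqrt(np.asarray(r0_ss, dtype=float) ** gamma_ss
                        * r0_pp ** gamma_pp)
    return r0_sp_pow, gamma_sp


def iterate_to_convergence(phi_start, update_phi, tol=1e-4, max_iter=50):
    """Repeat phi -> update_phi(phi) until the largest change is below tol.

    update_phi is a hypothetical callable standing in for equations 10
    and 13 applied in sequence.
    """
    phi = np.asarray(phi_start, dtype=float)
    for _ in range(max_iter):
        phi_new = np.asarray(update_phi(phi), dtype=float)
        if np.max(np.abs(phi_new - phi)) < tol:
            return phi_new
        phi = phi_new
    raise RuntimeError("phi_p(z) did not converge within max_iter iterations")


def correct_cosmic_variance(phi_p, dn_dz_observed, dn_dz_smooth):
    """Divide phi_p(z) by the ratio of observed to smooth dN_s/dz per z-bin."""
    ratio = (np.asarray(dn_dz_observed, dtype=float)
             / np.asarray(dn_dz_smooth, dtype=float))
    return np.asarray(phi_p, dtype=float) / ratio
```

A z-bin in which the spectroscopic sample is overdense relative to its smooth fit therefore has its reconstructed $\phi_p(z)$ scaled down by the same factor, and an underdense bin is scaled up.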
We have shown in this paper that by exploiting the cluster- 
ing of galaxies at similar redshifts we can accurately recover 
the redshift distribution of a photometric sample using its an- 
gular cross-correlation with a spectroscopic sample of known 
redshift distribution, using mock catalogs designed to match 
the DEEP2 Galaxy Redshift Survey. This test includes the 
impact of realistic bias evolution and cosmic variance. Our
error estimates for the recovered mean and standard deviation
of the distribution are larger than those predicted previously, 
but improvements could be obtained either by using more op- 
timal correlation function estimators or by surveying the same 
number of galaxies distributed over a wider area of sky. Based 
on these tests we expect that this technique should be able to 
deliver the performance needed for dark energy experiments.

In a recent paper (Schulz 2009), cross-correlation techniques
were applied to mock data generated by populating
a single time slice of an N-body dark matter simulation us- 
ing various halo models. They develop a pipeline for calcu- 
lating the redshift distribution of a photometric sample using 
cross-correlation measurements and the autocorrelation of a 
spectroscopic sample, $\xi_{ss}(r,z)$. They do not attempt to model
the bias although they do examine how varying the bias of the 
two samples affects the reconstruction (i.e. using radically 
different halo models). The catalogs constructed to test their 
method are significantly larger in volume than our individual 
mock catalogs, and while the number of objects in their photo- 
metric sample is comparable to ours, their spectroscopic sam- 
ple is much smaller, which would be expected to lead to larger 
errors (Newman 2008), as observed. Another major difference
is the use of a smoothness prior in reconstruction, which
was not done here. While Schulz (2009) found that
cross-correlation techniques were generally successful in
reconstructing redshift distributions, these conclusions were primarily
qualitative due to the limited sample sizes and source densities 
of the mock samples used, along with less-optimal correlation 
measurement techniques. In this paper, we have used simu- 
lations which include much less massive halos, allowing us 
to perform quantitative tests of cross-correlation techniques 
using sample sizes and source densities comparable to those 
which will be used in realistic applications. 

Several techniques for calibrating photometric redshifts 
using only photometric data have also been developed
(Schneider et al. 2006; Zhang et al. 2009; Benjamin et al.
2010; Quadri & Williams 2009); in general, such techniques
require priors or assumptions on biasing which can be relaxed 
or tested in spectroscopic cross-correlation measurements. In 
Quadri & Williams (2009), spectroscopic/photometric
cross-correlation techniques have now been applied to real data
using the COSMOS dataset. Using data from a single field, they
are able to determine typical photo-z uncertainties well, even 
when ignoring the effects of bias evolution. However, when 
constraining catastrophic photo-z errors, methods which ig- 
nore these effects should break down, as bias evolution should 
be a much greater problem over broad redshift intervals than 
in the core of the photo-z error distribution. 

In future work, we will explore alternate methods of measuring
correlation functions that are invariant to the variance
in the integral constraint (e.g. Padmanabhan et al. 2007).
This should reduce errors in the measurement of the redshift 
distribution, which we found to be larger than expected due to 
extra variance terms in the correlation function measurements 
not considered previously. We also plan to test this technique 
with mock catalogs in which photometric redshifts have been 
'measured' on simulated LSST photometry, rather than sim- 
simply assuming a redshift distribution. We will also apply this
method to real data using photometric and spectroscopic samples
from the AEGIS survey (Davis et al. 2007).

The authors wish particularly to thank Darren Croton for 
developing the mock catalogs used and making them publicly 
available. We would also like to thank Andrew Hearin, Arthur 
Kosowsky, David Wittman, Michael Wood-Vasey, Andrew 
Zentner, Tony Tyson, Gary Bernstein, Nikhil Padmanabhan, 
Dragan Huterer, Hu Zhan, and in particular Alexia Schulz for 
useful discussions during the course of this work, and also 
Ben Brown and Brian Cherinka for their technical expertise in 
performing these calculations. This work has been supported 
by the United States Department of Energy. 